 monotonicity condition



Group Retention when Using Machine Learning in Sequential Decision Making: the Interplay between User Dynamics and Fairness

Neural Information Processing Systems

Machine learning models developed from real-world data can inherit pre-existing bias in the dataset. When these models are used to inform decisions involving humans, they may exhibit similar discrimination against sensitive attributes (e.g., gender and race) [


Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence

arXiv.org Artificial Intelligence

We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework applies to both the entropy-regularized relaxed control problems and the classical control problems, with infinite horizon. We establish policy improvement and demonstrate convergence to the optimal control under the monotonicity condition of the Hamiltonian. By utilizing Langevin-type stochastic differential equations for continuous updates along the policy iteration direction, our approach enables the use of distribution sampling and non-convex learning techniques in machine learning to optimize the value function and identify the optimal control simultaneously.
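
To make "Langevin-type dynamics" concrete, the following is a minimal sketch of an Euler-Maruyama discretization of the overdamped Langevin equation dθ = -∇f(θ) dt + sqrt(2/β) dW applied to a one-dimensional non-convex objective. It is not the policy-value iteration from the abstract: the double-well objective, the step size, the inverse temperature `beta`, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def langevin_minimize(grad, theta0, step=1e-2, beta=50.0, n_steps=5000, seed=0):
    """Euler-Maruyama steps for d(theta) = -grad(theta) dt + sqrt(2/beta) dW."""
    rng = random.Random(seed)
    theta = theta0
    # Standard deviation of the Gaussian increment over one time step.
    noise_scale = math.sqrt(2.0 * step / beta)
    for _ in range(n_steps):
        theta = theta - step * grad(theta) + noise_scale * rng.gauss(0.0, 1.0)
    return theta


def grad_double_well(x):
    """Gradient of the hypothetical objective f(x) = (x^2 - 1)^2 + 0.3 x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


# Start between the two wells; the iterate drifts downhill and then
# fluctuates around a nearby minimizer.
theta_final = langevin_minimize(grad_double_well, theta0=-0.5)
```

A larger `beta` concentrates the iterates more tightly around a minimizer, while a smaller `beta` injects more noise and makes moves between wells more likely.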


Multi-Robot Planning for Filming Groups of Moving Actors Leveraging Submodularity and Pixel Density

arXiv.org Artificial Intelligence

Observing and filming a group of moving actors with a team of aerial robots is a challenging problem that combines elements of multi-robot coordination, coverage, and view planning. A single camera may observe multiple actors at once, and the robot team may observe individual actors from multiple views. As actors move about, groups may split, merge, and reform, and robots filming these actors should be able to adapt smoothly to such changes in actor formations. Rather than adopt an approach based on explicit formations or assignments, we propose an approach based on optimizing views directly. We model actors as moving polyhedra and compute approximate pixel densities for each face and camera view. Then, we propose an objective that exhibits diminishing returns as pixel densities increase from repeated observation. This gives rise to a multi-robot perception planning problem, which we solve using a combination of value iteration to optimize views for individual robots and greedy (sequential) submodular maximization to coordinate the team. We evaluate our approach on challenging scenarios modeled after various kinds of social behaviors and featuring different numbers of robots and actors, and observe that robot assignments and formations arise implicitly based on the movements of groups of actors. Simulation results demonstrate that our approach consistently outperforms baselines; in addition to performing well with the planner's approximation of pixel densities, our approach also performs comparably for evaluation based on rendered views. Overall, the multi-round variant of the sequential planner we propose meets (within 1%) or exceeds the formation and assignment baselines in all scenarios we consider.
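
The coordination step can be sketched as sequential greedy maximization of an objective with diminishing returns. The example below is an illustration only, not the authors' planner: the square-root reward over accumulated pixel density, the candidate views, and the names `coverage_value` and `sequential_greedy` are assumptions, and the value-iteration stage that would generate each robot's candidate views is omitted.

```python
import math


def coverage_value(selected_views):
    """Sum over faces of sqrt(accumulated pixel density).

    The square root is concave, so observing an already well-covered
    face again yields a smaller gain (diminishing returns).
    """
    totals = {}
    for view in selected_views:
        for face, density in view.items():
            totals[face] = totals.get(face, 0.0) + density
    return sum(math.sqrt(d) for d in totals.values())


def sequential_greedy(candidate_views_per_robot):
    """Robots choose in order; each takes the view with the largest marginal gain."""
    chosen = []
    for candidates in candidate_views_per_robot:
        base = coverage_value(chosen)
        best = max(candidates, key=lambda v: coverage_value(chosen + [v]) - base)
        chosen.append(best)
    return chosen


# Hypothetical candidate views: each maps a face identifier to a pixel density.
robots = [
    [{"actor_a": 4.0}, {"actor_b": 1.0}],
    [{"actor_a": 4.0}, {"actor_b": 3.0}],
]
plan = sequential_greedy(robots)
```

In this toy instance the second robot prefers the less-observed actor even though its view of the first actor has a higher raw density, which is the kind of implicit assignment the abstract describes.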


Monotone deep Boltzmann machines

arXiv.org Artificial Intelligence

Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the weights in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.


A Multiple Parameter Linear Scale-Space for one dimensional Signal Classification

arXiv.org Artificial Intelligence

Scale-space filtering provides a powerful framework for structural feature extraction and for the classification and recognition of waveforms. It is based on convolving a signal with a one-parameter family of kernels; the convolutions can be used to construct certain trees that correspond to the original signal ([5,17,23,26,28]). In this article we solve the following important problems: (I) We construct a maximal set of kernels that allows us to construct trees and has the property that signals with the same shape result in equivalent trees. It turns out that this maximal set of kernels is a set of pth fractional derivatives of a Gaussian.
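
As a rough illustration of convolving a signal with a one-parameter family of kernels, the sketch below smooths a test waveform with Gaussian kernels of increasing width and counts the local extrema that survive at each scale. It uses plain Gaussians rather than the fractional derivatives of a Gaussian discussed in the abstract, and the test signal, the truncation radius, the border clamping, and all function names are assumptions.

```python
import math


def gaussian_kernel(sigma, radius_in_sigmas=4.0):
    """Sampled Gaussian truncated at a few standard deviations, normalized to sum to 1."""
    radius = max(1, int(math.ceil(radius_in_sigmas * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve the signal with a Gaussian of scale sigma, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def count_extrema(signal):
    """Count interior samples where the discrete slope changes sign."""
    count = 0
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:
            count += 1
    return count


# Hypothetical waveform: a slow oscillation plus a faster, weaker one.
signal = [math.sin(2 * math.pi * i / 100) + 0.3 * math.sin(2 * math.pi * i / 8)
          for i in range(200)]
extrema_by_scale = {s: count_extrema(smooth(signal, s)) for s in (0.5, 2.0, 8.0)}
```

Tracking which extrema (or zero crossings of a derivative) persist as the scale grows is what allows a tree to be built over the signal's features.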


SAFFRON and LORD Ensure Online Control of the False Discovery Rate Under Positive Dependence

arXiv.org Machine Learning

Online testing procedures assume that hypotheses are observed in sequence, and allow the significance thresholds for upcoming tests to depend on the test statistics observed so far. Some of the most popular online methods include alpha investing, LORD++ (hereafter, LORD), and SAFFRON. These three methods have been shown to provide online control of the "modified" false discovery rate (mFDR). However, to our knowledge, they have only been shown to control the traditional false discovery rate (FDR) under an independence condition on the test statistics. Our work bolsters these results by showing that SAFFRON and LORD additionally ensure online control of the FDR under nonnegative dependence. Because alpha investing can be recovered as a special case of the SAFFRON framework, the same result applies to this method as well. Our result also allows for certain forms of adaptive stopping times, for example, stopping after a certain number of rejections have been observed. For convenience, we also provide simplified versions of the LORD and SAFFRON algorithms based on geometric alpha allocations.
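
To show how an online threshold can depend on what has been observed so far, here is a minimal sketch of a LORD++-style rule with a geometric spending sequence gamma_t = (1 - r) r^(t-1). The threshold update follows the commonly stated LORD++ form; the geometric sequence, the default values of `alpha`, `w0`, and `ratio`, and the function name are assumptions, and the simplified algorithms provided in the paper may differ.

```python
def lord_plus_plus(p_values, alpha=0.05, w0=0.025, ratio=0.9):
    """Return (thresholds, decisions) for a stream of p-values.

    Threshold at time t:
        gamma(t) * w0
        + (alpha - w0) * gamma(t - tau_1)
        + alpha * sum over later rejections tau_j of gamma(t - tau_j),
    where tau_j are the times of earlier rejections.
    """
    def gamma(t):
        # Geometric sequence that sums to 1 over t = 1, 2, ...
        return (1.0 - ratio) * ratio ** (t - 1) if t >= 1 else 0.0

    rejection_times = []
    thresholds, decisions = [], []
    for t, p in enumerate(p_values, start=1):
        level = gamma(t) * w0
        for idx, tau in enumerate(rejection_times):
            payout = (alpha - w0) if idx == 0 else alpha
            level += payout * gamma(t - tau)
        reject = p <= level
        thresholds.append(level)
        decisions.append(reject)
        if reject:
            rejection_times.append(t)
    return thresholds, decisions


# Hypothetical stream of p-values.
thresholds, decisions = lord_plus_plus([0.001, 0.8, 0.004, 0.3, 0.6])
```

Each rejection earns back some testing budget, so thresholds rise after a discovery and then decay geometrically until the next one.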


Hyperparameter Tricks in Multi-Agent Reinforcement Learning: An Empirical Study

arXiv.org Artificial Intelligence

In recent years, multi-agent deep reinforcement learning has been successfully applied to various complicated scenarios such as computer games and robot swarms. We thoroughly study and compare the state-of-the-art cooperative multi-agent deep reinforcement learning algorithms. Specifically, we investigate the consequences of the "hyperparameter tricks" of QMIX and its improved variants. Our results show that: (1) the significant performance improvements of these variant algorithms come from hyperparameter-level optimizations in their open-source code; and (2) after modest tuning and with no changes to the network architecture, QMIX can attain extraordinarily high win rates in all hard and super hard scenarios of the StarCraft Multi-Agent Challenge (SMAC) and achieve state-of-the-art (SOTA) performance. In this work, we propose a reliable QMIX benchmark, which will be of great benefit to subsequent research. We also propose a hypothesis to explain the excellent performance of QMIX.